AITopics | deep relu network

Posterior Concentration for Sparse Deep Learning

Neural Information Processing SystemsMar-16-2026, 20:56:33 GMT

We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with {\sl unknown} levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near minimax rate for alpha-Holder smooth maps, performing as well as if we knew the smoothness level alpha ahead of time.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Add feedback

Posterior Concentration for Sparse Deep Learning

Veronika Rockova, nicholas polson

Neural Information Processing SystemsFeb-12-2026, 21:37:37 GMT

Neural Information Processing Systems http://nips.cc/

deep learning, deep relu network, neural network, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)

Add feedback

979a3f14bae523dc5101c52120c535e9-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 11:14:42 GMT

activation function, approximation, neural network, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

Neural Information Processing SystemsDec-26-2025, 04:36:50 GMT

Deep neural networks have revolutionized many real world applications, due to their flexibility in data fitting and accurate predictions for unseen data. A line of research reveals that neural networks can approximate certain classes of functions with an arbitrary accuracy, while the size of the network scales exponentially with respect to the data dimension. Empirical results, however, suggest that networks of moderate size already yield appealing performance. To explain such a gap, a common belief is that many data sets exhibit low dimensional structures, and can be modeled as samples near a low dimensional manifold. In this paper, we prove that neural networks can efficiently approximate functions supported on low dimensional manifolds.

deep relu network, efficient approximation, neural network, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Neural Information Processing SystemsDec-25-2025, 18:27:10 GMT

The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks. In ReLU networks, the number of activation patterns is one measure of expressivity; and the maximum number of patterns grows exponentially with the depth. However, recent work has showed that the practical expressivity of deep networks - the functions they can learn rather than express - is often far from the theoretical maximum. In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns. Our work suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods.

deep network, deep relu network, expressivity, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Precise characterization of the prior predictive distribution of deep ReLU networks

Neural Information Processing SystemsDec-24-2025, 17:52:05 GMT

Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture. Similar in spirit to the kind of analysis that has been developed to devise better initialization schemes for neural networks (cf. He-or Xavier initialization), we derive a precise characterization of the prior predictive distribution of finite-width ReLU networks with Gaussian weights.While theoretical results have been obtained for their heavy-tailedness,the full characterization of the prior predictive distribution (i.e. its density, CDF and moments), remained unknown prior to this work. Our analysis, based on the Meijer-G function, allows us to quantify the influence of architectural choices such as the width or depth of the network on the resulting shape of the prior predictive distribution. We also formally connect our results to previous work in the infinite width setting, demonstrating that the moments of the distribution converge to those of a normal log-normal mixture in the infinite depth limit. Finally, our results provide valuable guidance on prior design: for instance, controlling the predictive variance with depth-and width-informed priors on the weights of the network.

characterization, precise characterization, predictive distribution, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Add feedback

Posterior Concentration for Sparse Deep Learning

Neural Information Processing SystemsNov-20-2025, 22:16:44 GMT

We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with {\sl unknown} levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near minimax rate for alpha-Holder smooth maps, performing as well as if we knew the smoothness level alpha ahead of time.

name change, posterior concentration, sparse deep learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

Add feedback

Posterior Concentration for Sparse Deep Learning

Veronika Rockova, nicholas polson

Neural Information Processing SystemsNov-20-2025, 16:38:41 GMT

Deep learning constructs are powerful tools for pattern matching and prediction.

artificial intelligence, bayesian inference, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)

Add feedback

The phase diagram of approximation rates for deep neural networks

Neural Information Processing SystemsAug-15-2025, 06:44:02 GMT

Moreover, we show that all networks with a piecewise polynomial activation function have the same phase diagram.

activation function, approximation, neural network, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

Feature Attribution from First Principles

Taimeskhanov, Magamed, Garreau, Damien

arXiv.org Artificial IntelligenceJun-2-2025

Feature attribution methods are a popular approach to explain the behavior of machine learning models. They assign importance scores to each input feature, quantifying their influence on the model's prediction. However, evaluating these methods empirically remains a significant challenge. To bypass this shortcoming, several prior works have proposed axiomatic frameworks that any feature attribution method should satisfy. In this work, we argue that such axioms are often too restrictive, and propose in response a new feature attribution framework, built from the ground up. Rather than imposing axioms, we start by defining attributions for the simplest possible models, i.e., indicator functions, and use these as building blocks for more complex models. We then show that one recovers several existing attribution methods, depending on the choice of atomic attribution. Subsequently, we derive closed-form expressions for attribution of deep ReLU networks, and take a step toward the optimization of evaluation metrics with respect to feature attributions.

artificial intelligence, attribution, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.24729

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Filters

Collaborating Authors

deep relu network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Posterior Concentration for Sparse Deep Learning

Posterior Concentration for Sparse Deep Learning

979a3f14bae523dc5101c52120c535e9-Paper.pdf

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Precise characterization of the prior predictive distribution of deep ReLU networks

Posterior Concentration for Sparse Deep Learning

Posterior Concentration for Sparse Deep Learning

The phase diagram of approximation rates for deep neural networks

Feature Attribution from First Principles